9 research outputs found

    JPEG steganography with particle swarm optimization accelerated by AVX

    Digital steganography aims at hiding secret messages in digital data transmitted over insecure channels. The JPEG format is prevalent in digital communication, and images are often used as cover objects in digital steganography. Optimization methods can improve the properties of images with an embedded secret but introduce additional computational complexity to their processing. In this work, AVX instructions available in modern CPUs are used to accelerate the data-parallel operations that are part of image steganography with advanced optimizations. Web of Science, 328, art. no. e544

    Inastemp: a novel intrinsics-as-template library for portable SIMD-vectorization

    The development of scientific applications requires highly optimized computational kernels to benefit from modern hardware. In recent years, vectorization has gained key importance in exploiting the processing capabilities of modern CPUs, whose evolution is characterized by increasing register widths and core counts but stagnating clock frequencies. In particular, vectorization allows floating point operations to be performed at a higher rate than the processor’s frequency. However, compilers often fail to vectorize complex codes, and pure assembly or intrinsic implementations often suffer from software engineering issues, such as poor readability and maintainability. Moreover, it is difficult for domain scientists to write optimized code without technical support. To address these issues, we propose Inastemp, a lightweight open-source C++ library. Inastemp offers a solution to develop hardware-independent computational kernels for the CPU. These kernels are portable across compilers and floating point precisions, and are vectorized targeting SSE(3,4.1,4.2), AVX(2), AVX512, or ALTIVEC/VMX instructions. Inastemp provides advanced features, such as an if-else statement that vectorizes branches that cannot be removed. Our performance study shows that Inastemp has the same efficiency as pure intrinsic approaches on modern architectures. As side results, this study provides micro benchmarks on the latest HPC architectures for three different computational kernels, emphasizing comparisons between scalar and intrinsic-based codes.
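
    The abstract mentions an if-else statement that vectorizes branches that cannot be removed. One common way to do this in SIMD code is mask-and-blend: evaluate the condition on every lane, evaluate both branches on every lane, then select per lane. The sketch below illustrates that idea on plain Python lists. It is not Inastemp's API; the names `blend_if_else` and `scalar_if_else` are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Sequence


def scalar_if_else(
    values: Sequence[float],
    predicate: Callable[[float], bool],
    then_branch: Callable[[float], float],
    else_branch: Callable[[float], float],
) -> List[float]:
    """Reference version: a real branch is taken for every element."""
    out = []
    for v in values:
        if predicate(v):
            out.append(then_branch(v))
        else:
            out.append(else_branch(v))
    return out


def blend_if_else(
    values: Sequence[float],
    predicate: Callable[[float], bool],
    then_branch: Callable[[float], float],
    else_branch: Callable[[float], float],
) -> List[float]:
    """Mask-and-blend version (hypothetical helper, not a library call).

    Each list plays the role of one vector register; each element is a lane.
    """
    # 1. Build the mask: one boolean per lane.
    mask = [predicate(v) for v in values]
    # 2. Evaluate BOTH branches on all lanes, with no data-dependent jump.
    then_vals = [then_branch(v) for v in values]
    else_vals = [else_branch(v) for v in values]
    # 3. Blend: pick the 'then' result where the mask is set, else the other.
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5, 3.0]
    blended = blend_if_else(data, lambda x: x > 0.0, lambda x: x * x, lambda x: -x)
    reference = scalar_if_else(data, lambda x: x > 0.0, lambda x: x * x, lambda x: -x)
    print(blended == reference)
```

    The trade-off is that both branches are always computed, so the approach pays off mainly when the branches are cheap and free of side effects.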

    Hierarchical Randomized Low-Rank Approximations: Applications to covariance kernel matrices and generation of Gaussian Random Fields

    We propose a new efficient algorithm for performing hierarchical kernel MVPs in O(N) operations called the Uniform FMM (UFMM), an FFT-accelerated variant of the black-box FMM by Fong and Darve. The UFMM is used to speed up randomized low-rank methods, thus reducing their computational cost to O(N) in time and memory. Numerical benchmarks include low-rank approximations of covariance matrices for the simulation of stationary random fields on very large distributions of points.

    Complexes++: Efficient and versatile coarse-grained simulations of protein complexes and their dense solutions

    The interior of living cells is densely filled with proteins and their complexes, which perform multitudes of biological functions. We use coarse-grained simulations to reach the system sizes and time scales needed to study protein complexes and their dense solutions and to interpret experiments. To take full advantage of coarse-graining, the models have to be efficiently implemented in simulation engines that are easy to use, modify, and extend. Here, we introduce the Complexes++ simulation software to simulate a residue-level coarse-grained model for proteins and their complexes, applying a Markov chain Monte Carlo engine to sample configurations. We designed a parallelization scheme for the energy evaluation capable of simulating both dilute and dense systems efficiently. Additionally, we designed the software toolbox pycomplexes to easily set up complex topologies of multi-protein complexes and their solutions in different thermodynamic ensembles and in replica-exchange simulations, to grow flexible polypeptide structures connecting ordered protein domains, and to automatically visualize structural ensembles. Complexes++ simulations can easily be modified and they can be used for efficient explorations of different simulation systems and settings. Thus, the Complexes++ software is well suited for the integration of experimental data and for method development
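
    The abstract describes a Markov chain Monte Carlo engine that samples configurations. As a minimal sketch of the general idea, the code below runs a Metropolis sampler on a one-dimensional harmonic potential. This is a textbook toy, not the Complexes++ model or its implementation; the potential, the function names, and all parameters (`step_size`, `kT`, `seed`) are assumptions made for illustration.

```python
import math
import random
from typing import List


def harmonic_energy(x: float, k: float = 1.0) -> float:
    """Toy energy function (assumed for illustration): E(x) = k * x^2 / 2."""
    return 0.5 * k * x * x


def metropolis_accept(delta_energy: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-dE / kT)."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)


def sample(n_steps: int, step_size: float = 1.0, kT: float = 1.0, seed: int = 0) -> List[float]:
    """Run n_steps of Metropolis sampling and return the visited positions."""
    rng = random.Random(seed)  # private generator, so runs are reproducible
    x = 0.0
    energy = harmonic_energy(x)
    samples = []
    for _ in range(n_steps):
        # Propose a symmetric random displacement of the current configuration.
        trial = x + rng.uniform(-step_size, step_size)
        trial_energy = harmonic_energy(trial)
        if metropolis_accept(trial_energy - energy, kT, rng):
            x, energy = trial, trial_energy
        # A rejected move counts the old configuration again.
        samples.append(x)
    return samples


if __name__ == "__main__":
    xs = sample(20000)
    mean = sum(xs) / len(xs)
    print(f"mean position over {len(xs)} steps: {mean:.3f}")
```

    In a real simulation engine, the energy evaluation is the expensive part, which is why the abstract singles out its parallelization.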

    Certified Gathering of Oblivious Mobile Robots: survey of recent results and open problems

    Swarms of mobile robots have recently attracted the attention of the Distributed Computing community. One of the fundamental problems in this context is that of gathering the robots: the robots must join a common location, not known beforehand. Despite its apparent simplicity, this problem proved quite hard to characterise fully, due to many model variants, leading to informal error-prone reasoning. Over the past few years, a significant effort has made it possible to set up a formal framework, relying on the Coq proof assistant, that was used to provide certified results related to the gathering problem. We survey the main abstractions that make it possible to reason about oblivious mobile robots that evolve in a bidirectional Euclidean space, the distributed executions they can perform, and the variants of the gathering problem they can solve, while certifying all obtained results. We also hint at the remaining steps to obtain a certified full characterisation of the problem.

    Packet Efficient Implementation of the Omega Failure Detector

    We assume that a message may be delivered by packets through multiple hops and investigate the feasibility and efficiency of an Omega Failure Detector implementation. To motivate the study, we prove that the existence and sustainability of a leader is exponentially more probable in a multi-hop than in a single-hop implementation. An implementation is: message efficient if all but finitely many messages are sent by a single process; packet efficient if it is message efficient and the number of packets used to transmit all but finitely many messages is proportional to the number of processes in the system; super packet efficient if it is message efficient and the number of channels used to transmit all but finitely many packets is proportional to the number of processes in the system. We prove that a super packet efficient implementation of Omega is impossible. We establish necessary conditions for the existence of a packet efficient implementation of Omega and present an algorithm that implements Omega under these conditions.